The genome of the Gulf pipefish enables understanding of evolutionary innovations

Genome sequencing libraries and genome sequence assembly

We isolated genomic DNA from a single adult male pipefish purchased from Gulf Specimen Marine Laboratories, Inc. (Panacea, FL, USA) in 2010 using standard organic extraction. We generated four different 100 nt paired-end Illumina libraries for whole genome shotgun assembly: (1) a short (~180 bp) insert length library; (2) a 2.5–5 kb insert length jumping library; (3) a 5–10 kb insert length jumping library; and (4) an 11–15 kb insert length jumping library. To construct the 180 bp library, we sheared 1 μg of genomic DNA to less than 500 bp using sonication in a Bioruptor (Diagenode) and size-selected fragments by agarose gel electrophoresis, followed by end repair of the fragments, addition of adenosine overhangs, ligation of Illumina sequencing adapters, and 12 cycles of PCR amplification with Phusion polymerase (NEB). We used the Illumina Nextera Matepair Sample Preparation Kit (Illumina, cat. #FC-132-1001) to generate the three jumping libraries. Briefly, we performed a single tagmentation reaction using 5 ng of genomic DNA, selected the three aforementioned fragment size ranges using agarose gel electrophoresis, and performed the remaining library preparation steps in parallel, including circularization, shearing by Bioruptor (30 s on, 60 s off, for 15 min), streptavidin bead pull-down, end repair, addition of adenosine overhangs, Illumina indexed adapter ligation, and 15 cycles of PCR amplification. We sequenced the short-insert library (two lanes) and the three jumping libraries (all in one lane) on an Illumina HiSeq2000 at the University of Oregon Genomics Core Facility (UOGCF).

To minimize the inclusion of sequencing adaptors, sequencing errors, and repetitive DNA sequences in the assembly process, we used tools from the Stacks software suite [91, 92] to adaptor-trim and discard low-quality read pairs (process_shortreads) and to filter pairs containing abundant k-mers (kmer_filter). After filtering, 238.6 million overlapping read pairs, 3.5 million 11–15 kb mate-pairs, 21.6 million 5–10 kb mate-pairs, and 44.4 million 2.5–5 kb mate-pairs remained, which we used for assembly with ALLPATHS-LG [21]. Because initial k-mer spectrum analyses suggested a highly polymorphic genome, we ran ALLPATHS-LG with HAPLOIDIFY = TRUE. To assess the completeness of the assembly with respect to core eukaryotic genes (CEGs), we used CEGMA [22]. For a summary of all Illumina sequencing data used in the assembly, see Additional file 3.
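The idea behind filtering read pairs that contain abundant k-mers can be illustrated with a minimal Python sketch. This is not the kmer_filter implementation from Stacks; the function names, the thresholds (`max_count`, `max_fraction`), and the rule of discarding a pair when either read is dominated by abundant k-mers are assumptions made for illustration only.

```python
from collections import Counter


def kmers_of(read, k):
    """Return the list of all k-mers in a read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def is_abundant(read, counts, k, max_count, max_fraction):
    """True if at least max_fraction of the read's k-mers occur more than max_count times."""
    kmers = kmers_of(read, k)
    if not kmers:
        return False
    n_abundant = sum(1 for km in kmers if counts[km] > max_count)
    return n_abundant / len(kmers) >= max_fraction


def filter_abundant_pairs(pairs, k=15, max_count=1000, max_fraction=0.8):
    """Keep only read pairs in which neither read is dominated by abundant k-mers.

    `pairs` is a list of (forward, reverse) read strings. All thresholds are
    hypothetical defaults, not values taken from the study.
    """
    # Pass 1: count every k-mer across the whole data set.
    counts = Counter()
    for fwd, rev in pairs:
        counts.update(kmers_of(fwd, k))
        counts.update(kmers_of(rev, k))
    # Pass 2: discard a pair if either mate looks repetitive.
    return [
        (fwd, rev)
        for fwd, rev in pairs
        if not is_abundant(fwd, counts, k, max_count, max_fraction)
        and not is_abundant(rev, counts, k, max_count, max_fraction)
    ]
```

In this sketch a pair is removed as a unit, so that the surviving reads stay properly paired for the assembler.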

We confirmed several apparent pipefish gene losses via comparison among preliminary genome assemblies derived from independently constructed molecular libraries and generated using SGA [93] and Velvet [94] and via targeted Sanger sequencing. Briefly, SGA and Velvet assemblies incorporated a shotgun genomic DNA library with an insert length of 470 nt, sequenced independently with 120 nt, 100 nt, and 80 nt paired-end Illumina reads. For the SGA assembly, the overlap value was optimized to 70 during the contig construction phase. Scaffolding was performed using SSPACE [95], with the three mate-pair libraries mentioned above and an additional 2–8 kb mate-pair library. These analyses filled seven small gaps in the range of 51–1753 nt in the hoxba, hoxbb, hoxca, and hoxda clusters. The degraded nature of hoxa7a was also confirmed by Sanger sequencing.

RNA-seq libraries and transcriptome assemblies

Embryo and fry transcriptome

Embryos, flushed from the pouch of lab-reared pregnant males, and fry were euthanized in Tricaine-S and stored in RNA-Later (Ambion). Tissue including the head to just posterior to the pectoral fin was dissected and pooled from 17 embryos (including 15 at 8 days post fertilization (dpf) and 2 at 10 dpf) and from 18 fry (including 2 at 16 dpf and 16 at 17 dpf). Double-stranded complementary DNA (cDNA) was produced from these tissues via standard methods including the RiboPure Kit (Ambion) for total RNA isolation, the MicroPoly(A)Purist Kit (Ambion) for messenger RNA (mRNA) enrichment, mostly hexameric Random Primers (ThermoFisher, #48190-011) and Superscript III reverse transcriptase (Invitrogen) for first strand synthesis, and Random Primers with Klenow exo- DNA polymerase (Epicentre) for second strand synthesis. Paired-end Illumina sequencing libraries were created using standard methods including mechanical shearing of the cDNA and TA ligation of adaptors (top, 5′ACACTCTTTCCCTACACGACGCTCTTCCGATC*T3′; bottom, 5′Phos-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG3′), slab gel size fractionation to isolate fragments in the 200–500 bp range, and amplification using Illumina-compatible primers (5′AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT3′ and P2 reverse primer, 5′CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT3′). The library was sequenced on an Illumina GAIIx platform to produce 60 nt paired-end reads and on an Illumina HiSeq2000 platform to produce 100 nt paired-end reads (see Additional file 3 for details).

Male brood pouch

Six non-pregnant and six early-stage pregnant adult males were captured from Redfish Bay, TX, USA (Lat: 27.86795057508745, Long: –97.08869218576297), transported to the laboratory, and euthanized as described above approximately 24 h after capture. We carefully dissected all brooding tissues, including the pouch “flaps” and epithelium, but excluding all embryonic tissue in the case of pregnant males. We fixed tissues in RNA-Later (Ambion) before freezing, homogenized them by pestle upon thawing, and isolated total RNA using Trizol Reagent (Invitrogen) and RNeasy MinElute columns (Qiagen). A unique RNA-seq library was generated for each individual from 1 μg of total RNA using the TruSeq RNA v2 Kit (Illumina), and the 12 mRNA-seq libraries were sequenced across two lanes of an Illumina HiSeq2000, generating 100 nt paired-end reads.

De novo transcriptome assemblies

We removed low-quality and adaptor sequences from RNA-seq reads using process_shortreads from Stacks [91, 92], overlapped paired-end reads using FLASH [96], and performed rare k-mer filtering and digital normalization using kmer_filter from Stacks. We then generated two separate de novo transcriptome assemblies (one for each tissue type) from the cleaned, filtered RNA-seq data using Trinity [97] with --min_kmer_cov set to 3.
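Digital normalization reduces redundant coverage of highly expressed transcripts before assembly. The following Python sketch shows one common formulation, in which a read is kept only while the median count of its k-mers among previously kept reads is below a target depth. It is a simplified illustration and not the kmer_filter code from Stacks; the function name and the `k` and `target_depth` values are hypothetical.

```python
from collections import Counter
from statistics import median


def digital_normalize(reads, k=15, target_depth=20):
    """Keep a read only if its median k-mer count so far is below target_depth.

    `k` and `target_depth` are illustrative defaults, not values from the study.
    """
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k: nothing to evaluate
        if median(counts[km] for km in kmers) < target_depth:
            kept.append(read)
            counts.update(kmers)  # only kept reads contribute to coverage
    return kept
```

Because only retained reads update the counts, coverage of any one transcript levels off near the target depth, while reads from rarely sampled transcripts continue to be accepted.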

Genome annotation

Prior to genome annotation, the assembly was soft-masked for repetitive elements and areas of low complexity with RepeatMasker [98] using a custom Gulf pipefish library created by RepeatModeler [99], Repbase repeat libraries [100], and a list of known transposable elements provided by MAKER [25]. In total, 15.36% of the genome assembly was masked by RepeatMasker. Repetitive elements were annotated with RepeatModeler. Hidden Markov models (HMMs) for gene prediction were generated by SNAP [101] and Augustus [102] and were iteratively trained for the assembly using MAKER as described by Cantarel et al. [103]. Training was performed on the five largest scaffolds and two additional scaffolds that were UTR-rich, totaling 25 Mb. Evidence used by MAKER for annotation included Gulf pipefish mRNA-seq transcriptomes from embryonic head tissue and brood pouch tissue (assembled with Trinity – see above), protein sequences from threespine stickleback (Gasterosteus aculeatus), zebrafish (Danio rerio), medaka (Oryzias latipes), and tilapia (Oreochromis niloticus) (downloaded from Ensembl: Broad S1, GRCz10, HdrR, Orenil1.0, respectively), and all UniProt/Swiss-Prot proteins [104].
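In a soft-masked assembly, masked bases are written in lowercase, so a masked percentage such as the one reported above can be checked directly from the FASTA file. The sketch below assumes that convention; the function name is hypothetical, and counting ambiguous bases such as N in the denominator is an assumption that may differ from how RepeatMasker reports its summary.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of bases that are lowercase (soft-masked).

    `fasta_lines` is any iterable of FASTA lines, for example an open file.
    Header lines (starting with '>') and blank lines are skipped.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked / total if total else 0.0
```

For a file on disk, the function could be called as `masked_fraction(open("assembly.masked.fa"))`, where the file name is a placeholder.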

We filtered the annotations by MAKER to include evidence-based annotations with assembled transcriptome or protein support and those ab initio gene predictions that contained protein family domains as detected with InterProScan [105]. Gene annotations were manually refined for hox, astacin-like metalloprotease, and pitx genes. For each annotated amino acid sequence we queried the NCBI nr database using BLASTP and compiled the results for the top BLASTP hit per gene in Additional file 2: SH6.
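The filtering rule described above amounts to a simple predicate over gene models. The following Python sketch expresses it under the assumption that each model has already been flagged for transcript support, protein support, and InterProScan domain hits; the `GeneModel` class and its field names are hypothetical and do not correspond to MAKER output fields.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str
    has_transcript_support: bool  # overlaps an assembled transcript
    has_protein_support: bool     # supported by a protein alignment
    has_interpro_domain: bool     # a protein family domain was detected


def keep_gene_model(model):
    """Keep evidence-based models, plus ab initio models with a detected domain."""
    evidence_based = model.has_transcript_support or model.has_protein_support
    return evidence_based or model.has_interpro_domain


def filter_gene_models(models):
    """Return the subset of gene models that pass the rule above."""
    return [m for m in models if keep_gene_model(m)]
```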

Linkage map and map integration

Mapping cross

For the genetic cross, wild male and female S. scovelli were captured from Redfish Bay and maintained in the lab. A total of six sequential broods from a single mated pair, totaling 108 F1 progeny, including fry from the brood pouch plus 15 collected just prior to emergence, were gathered and flash frozen over a span of 4 months. Genomic DNA was isolated from individual progeny and from their parents via the Qiagen DNeasy Kit. RAD-seq libraries were made using the restriction enzyme SbfI as in Baird et al. [106], Hohenlohe et al. [107], and Etter et al. [108] with the Illumina-compatible, barcoded P1 adapters and primer types used in Hohenlohe et al. [109] and the P2 adapter type used in Hohenlohe et al. [107]. Single-end reads of 100 nt were produced from two lanes on an Illumina HiSeq2000 (see Additional file 3 for details). The parents were sequenced to greater depth than progeny (see below) to make an accurate catalog of diploid genotypes possible in the cross.

Marker genotyping

The two lanes of Illumina data resulted in 367,085,475 raw reads, which were analyzed using the Stacks software [91, 92]. Using the process_radtags program, reads were demultiplexed according to barcode and discarded if the barcode could not be determined after correcting for sequencing error, if the restriction enzyme cut site was not intact, or if the sequencing quality was too degraded. The 218,309,324 remaining reads were analyzed by the Stacks de novo pipeline to assemble and genotype the RAD loci. A minimum of three identical reads (-m 3) was required to form a “stack” or putative allele in each individual, up to five differences were allowed when merging stacks into putative loci (-M 5), and up to three differences were allowed when merging loci from different individuals into the catalog (-n 3) to accommodate fixed differences between the cross parents. The genotypes program from Stacks was used to export data in a CP cross-format for use in JoinMap, and the genotypes were uploaded to the Stacks web interface. Genotype data with markers present in at least 75 of the 108 individual progeny were exported from the web interface for linkage analysis.
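The roles of the minimum stack depth, the mismatch allowance between stacks, and the 75-of-108 progeny threshold can be illustrated with a short Python sketch. This is a deliberately simplified, greedy approximation and not the algorithm implemented in Stacks; the function names and the data layout of `genotypes` are assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_loci(reads, min_depth=3, max_dist=5):
    """Form stacks from identical reads, then greedily merge similar stacks.

    min_depth plays the role of -m and max_dist the role of -M. Each stack is
    compared only against the first member of each existing locus, which is a
    simplification of the real clustering.
    """
    depth = Counter(reads)
    stacks = [seq for seq, n in depth.items() if n >= min_depth]
    loci = []
    for seq in stacks:
        for locus in loci:
            if len(seq) == len(locus[0]) and hamming(seq, locus[0]) <= max_dist:
                locus.append(seq)
                break
        else:
            loci.append([seq])  # no similar locus found: start a new one
    return loci


def markers_with_enough_progeny(genotypes, min_progeny=75):
    """Return markers genotyped (not None) in at least min_progeny individuals.

    `genotypes` maps marker id -> {progeny id: genotype call or None}.
    """
    return [
        marker
        for marker, calls in genotypes.items()
        if sum(g is not None for g in calls.values()) >= min_progeny
    ]
```

A larger mismatch allowance merges more divergent alleles into one locus but also raises the risk of collapsing paralogous sequences, which is why these parameters are usually tuned to the expected polymorphism of the cross.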

Map construction

Linkage analysis was performed with JoinMap 4.1 [110] using only markers that were present in at least 75 of the 108 individual progeny. Markers were initially grouped in JoinMap 4.1 using the “independence LOD” parameter under “population grouping” at a minimum LOD value of 15.0, and markers that remained unlinked at LOD 


